Prithviraj Ammanabrolu

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

May 19, 2026
Via arXiv

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

May 11, 2026
Via arXiv

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

Apr 06, 2026
Via arXiv

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Jan 30, 2026
Via arXiv

Preference-Based Learning in Audio Applications: A Systematic Analysis

Nov 17, 2025
Via arXiv

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

Nov 07, 2025
Via arXiv

Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning

May 22, 2025
Via arXiv

Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning

Apr 24, 2025
Via arXiv

TALES: Text Adventure Learning Environment Suite

Apr 22, 2025
Via arXiv

In-context Ranking Preference Optimization

Apr 21, 2025
Via arXiv